Predicting successful draft outcome in Australian Rules football: Model sensitivity is superior in neural networks when compared to logistic regression

Using logistic regression and neural networks, the aim of this study was to compare model performance when predicting player draft outcome during the 2021 AFL National Draft. Physical testing, in-game movement and technical involvements were collected from 708 elite-junior Australian Rules football players during consecutive seasons. Predictive models were generated using data from 465 players (2017 to 2020). Data from 243 players were then used to prospectively predict the 2021 AFL National Draft. Logistic regression and neural network models were compared for specificity, sensitivity and accuracy using relative cut-off thresholds from 5% to 50%. Using factored and unfactored data, and a range of relative cut-off thresholds, neural networks accounted for 73% of the 40 best performing models across positional groups and data configurations. Neural networks correctly classified more drafted players than logistic regression in 88% of cases at draft rate (15%) and convergence threshold (35%). Using individual variables across thresholds, neural networks (specificity = 79 ± 13%, sensitivity = 61 ± 24%, accuracy = 76 ± 8%) were consistently superior to logistic regression (specificity = 73 ± 15%, sensitivity = 29 ± 14%, accuracy = 66 ± 11%). Where the goal is to identify talented players with draft potential, model sensitivity is paramount, and neural networks were superior to logistic regression.


Introduction
The identification of characteristics that predict sporting talent is important in recruiting and developing future elite-level performers [1]. Talent identification is a process that recognises characteristics in players that are congruent with expert performance, while recruitment is the process undertaken to acquire the desired talent [1][2][3]. Performance in team sport begins with the periodic recruitment of identified talent and is then determined by the complex interplay and combination of physical, technical, and tactical elements [4]. Identifying the strengths and weaknesses of individual players that impact draft success can inform recruitment, player development and training decisions [5][6][7].
Within Australian Rules football (AF), recruitment occurs informally through scouting of the junior participation pathway into the elite-junior talent pathway and formally through a yearly structured draft system in the elite-senior Australian Football League (AFL). Similar to other draft-based sports like Basketball [8] and American Football [9], talented juniors and off-contract players are invited to be part of the AFL's National Draft in November of each year. Therefore, the primary objective of the elite-junior talent pathway is to provide AFL teams with new recruits who possess desirable attributes. Although being drafted does not guarantee a successful career in the AFL, draft success is a critical step in the talent pathway. Consequently, the potential exists to use data from the elite-junior talent pathway to reduce subjectivity in the selection process [5][6][7].
Predictive modelling has been used extensively in other sports and is now more common both in the literature and its application within AF player recruitment, particularly the draft system [5,6,10]. Authors have investigated combinations of physical, in-game movement, and match performance variables that are associated with early (≤5 year) career success and contract renewals within the AFL [11,12]. Physical testing performance has been used in elite-junior populations to predict selection level or playing status [5][6][7], and in-game movement and a small number of key technical involvements have been used to predict draft outcome [13,14]. In a more recent study, authors used anthropometric, physical testing, and in-game movement variables to generate factors and investigate associations with draft success. While results yielded excellent accuracy and specificity, the sensitivity, or the ability to characterise a player with draft potential, was less convincing. Including anthropometric, physical testing, and in-game movement improved the overall accuracy in existing models; however, the relatively poor ability to predict those players who will be drafted limits the application of these models in the recruitment process.
The existing predictive models in AF use traditional statistical techniques like multiple linear regression or logistic regression, which are more suited to linear data and simple relationships, often with poor or unreported sensitivity [15]. If the primary purpose of these models is to distinguish the high performers from the rest (i.e., drafted versus not-drafted), sensitivity, or the correct classification of true positives, should be considered with high priority. Artificial neural networks are becoming more widely used within sport, given their ability to identify trends in complex, real-world datasets that are non-linear in nature, where data are often not normally distributed [16]. With applications in prediction and classification tasks, neural networks operate in a manner that mimics the functionality of the human brain and its decision-making capacity by analysing the interaction of numerous complex variables [15]. For example, neural networks have been employed in soccer, with performance variables from matches in the English Championship used to predict a player's career trajectory by determining if they would be promoted to a higher league, continue in the current league, or play in lower leagues in future (78.8% success rate) [15].
The ability to use routinely collected elite-junior AF player data to predict draft success with high sensitivity would be useful to recruiters, clubs, coaches and players. Therefore, using logistic regression and neural networks, the aim of this study was to compare model performance when predicting player draft outcome during the 2021 AFL National Draft.

Materials and methods
Physical testing, in-game movement (from Global Positioning Systems; GPS), and technical involvement data were collated from 708 elite-junior male Australian Rules football players competing in the under 18 boys NAB League competition during five consecutive seasons (2017 to 2021). Data from the 2017 to 2020 seasons were made available at the conclusion (30th September) of the 2020 competitive season. Data from the 2021 season were made available again at the conclusion of the competitive season, allowing sufficient time to process prior to the National Draft on 24th November. Data were organised to only include those participants who were eligible (18th year) in each respective year. Data from eligible players (n = 465; drafted = 90, not-drafted = 375) who competed in the 2017 to 2020 seasons were used to construct logistic regression and neural network models, with draft outcome (drafted or not-drafted) being the binary response variable. Player data from the 2021 season (n = 243) were run through the previously constructed models to prospectively predict draft success prior to the 2021 draft. Model performance was then analysed after the draft. Access to archived data was granted by the Australian Football League. Institutional ethics approval with waived participant consent was granted by La Trobe University Human Ethics Committee (ref: HEC20065).
Physical testing outcomes were determined in March prior to the commencement of each season, and included: stature (cm), reach (cm), body mass (kg), vertical jump (cm), running vertical jump left (RVJL; cm), running vertical jump right (RVJR; cm), 20-m sprint (s), AFL Agility (s) and the Yo-Yo Intermittent Recovery Test (estimated V̇O₂max; ml·kg⁻¹·min⁻¹). Estimated V̇O₂max was used to maintain consistency with the previous literature from which the logistic regression models were generated. Time (s) recorded in the 20-m sprint test was converted to an average speed (m·s⁻¹) and the time recorded to complete the AFL Agility test was subtracted from 10 s to ensure a faster time on both tests was represented by a larger number [4].
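The two transformations described above can be illustrated with a minimal Python sketch. The function names and default arguments are hypothetical and chosen only for illustration; they are not taken from the study's analysis pipeline.

```python
def sprint_speed(time_s: float, distance_m: float = 20.0) -> float:
    """Convert a sprint time (s) to average speed (m/s) over the sprint distance."""
    if time_s <= 0:
        raise ValueError("sprint time must be positive")
    return distance_m / time_s


def agility_score(time_s: float, reference_s: float = 10.0) -> float:
    """Subtract the agility time from a 10 s reference so faster runs score higher."""
    return reference_s - time_s


# Hypothetical example values, not data from the study.
print(sprint_speed(3.0))   # average speed over 20 m for a 3.0 s sprint
print(agility_score(8.2))  # transformed agility score for an 8.2 s run
```

After both transformations, a larger value always indicates better performance, which keeps the direction of the physical testing covariates consistent.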
In-game movement data were collected at 10 Hz using GPS (Optimeye X4/S5; Catapult Innovations, Melbourne, Australia) as a standard procedure during 331 matches (5,240 appearances; mean = 13 ± 7 appearances per player). GPS variables assessed were software-derived and included field time (min), total distance (m), relative distance (total distance/field time [m·min⁻¹]), high speed running (HSR) efforts and sprint efforts. The velocity thresholds used for HSR and sprint efforts were 4.00 to 5.99 m·s⁻¹ and ≥6.00 m·s⁻¹, respectively.
Technical involvement data were collected by an external provider (Champion Data™, Melbourne, Australia) as a standard procedure during the same 331 matches and made available for analyses in their raw format with associated timestamps and player name. Variables were grouped for analyses to include relative involvements (n·min⁻¹), relative disposals (n·min⁻¹), relative possessions (n·min⁻¹), relative pressure acts (n·min⁻¹), and relative positive involvements (n·min⁻¹).
Players were assigned specific positions by coaches during physical testing. For analysis purposes, players were then assigned to an all-position group and three positional groups (nomadic, fixed, fixed&ruck). Due to their small sample (n = 15), ruckmen were combined with fixed-position players to form the fixed&ruck group. Ruckmen have comparable positional roles and physical attributes to fixed-position players [17]. Variables were collected from physical, GPS and technical data. To limit the impact of highly correlated variables and reduce the number of covariates, factor analysis using principal components analysis with oblique rotation was performed prior to logistic regression on all available variables (Version 26 IBM SPSS Statistics for Windows; IBM Corp, Armonk NY, USA). Underlying latent factors were identified using loading scores, which were then used as covariates in logistic regression models [4]. Variables that did not load on one specific factor were treated as their own covariate. For comparative purposes, logistic regression was also performed using unfactored data.
Evolutionary algorithms were used to define the architecture, optimisation methods, and parameters for the neural network models in customised software (Analysis and Recommendation Engine; PhysiGo Ltd., Wiltshire, England). The evolutionary component of the analysis process incorporated four parameters for building the network (number of layers, number of neurons in hidden layers, layer connectivity and their respective weights). The training optimisation methods were back propagation, Bayesian regulated optimisation and Gaussian random optimisation. For back propagation, parameters included learning rate, momentum terms, squashing terms, convergence terms, and a fitness value (χ²) to compare models. Parameters were encoded in a gene (eDNA). Retrospective data were randomly split into three subsets: one third was used to optimise the neural network given the parameters defined by the evolutionary algorithms, one third was used only to optimise the evolutionary algorithm based on the result of the neural network, and one third was used to independently test the model on unseen data. This network design ensures no overfitting and avoids local minima and bias introduction. During model development, an eDNA string was selected from one of several gene pools (tribes) to build and optimise the neural network. The software then used the eDNA to run the defined optimisation technique on a model and evaluate the eDNA's fitness. Fitness was then used to steer further model selection. Gene pools and tribes were set to compete against each other in a parallel modelling environment. Neural networks used the area under the curve of a Receiver Operator Characteristics curve to evaluate true/false positives/negatives in a confusion matrix.
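The random three-way split of the retrospective data can be sketched as follows. This is only an illustration of the idea under stated assumptions: the function name, the fixed seed and the use of integer player indices are hypothetical, and the sketch does not reproduce the proprietary software used in the study.

```python
import random


def three_way_split(items, seed=0):
    """Randomly split items into three near-equal, non-overlapping subsets.

    Subset 1: optimise the neural network weights.
    Subset 2: score candidate networks for the evolutionary algorithm.
    Subset 3: held-out test on unseen data.
    """
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)  # seeded for reproducibility (assumption)
    n = len(shuffled)
    first_cut, second_cut = n // 3, (2 * n) // 3
    return shuffled[:first_cut], shuffled[first_cut:second_cut], shuffled[second_cut:]


# 465 retrospective players, represented here by hypothetical integer IDs.
train, evolve, test = three_way_split(range(465), seed=1)
print(len(train), len(evolve), len(test))
```

Keeping the third subset entirely out of both optimisation loops is what allows it to serve as an independent check on unseen data.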
For each respective model, players were ranked on their probability of being drafted in Microsoft Excel (Microsoft Corporation, Washington, USA). To evaluate the relative performance of models, cut-off thresholds ranging from 5% to 50% were used to predict 2021 draft outcome by player and data configuration in each positional group. Players ranked within the respective cut-off thresholds were allocated the binary outcome of drafted, and players ranked outside the cut-off thresholds were allocated the binary outcome of not-drafted. Following the draft, confusion matrices were constructed using the predicted and observed outcomes. Model performance was evaluated using specificity (number of true not-drafted/number of all not-drafted), sensitivity (number of true drafted/number of all drafted) and accuracy (number of correct assessments/number of all assessments).
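The ranking, relative cut-off and confusion-matrix steps can be sketched in Python as below. The study performed the ranking in Microsoft Excel; the function names here are hypothetical, and rounding the number of selected players to the nearest integer is an assumption, since the rounding rule is not stated.

```python
def classify_by_cutoff(probabilities, cutoff):
    """Label the top `cutoff` fraction of players, ranked by probability, as drafted."""
    n = len(probabilities)
    n_selected = round(cutoff * n)  # assumption: nearest whole number of players
    ranked = sorted(range(n), key=lambda i: probabilities[i], reverse=True)
    predicted = [False] * n
    for i in ranked[:n_selected]:
        predicted[i] = True
    return predicted


def performance(predicted, observed):
    """Sensitivity, specificity and accuracy; assumes both classes are present."""
    tp = sum(p and o for p, o in zip(predicted, observed))
    tn = sum((not p) and (not o) for p, o in zip(predicted, observed))
    fp = sum(p and (not o) for p, o in zip(predicted, observed))
    fn = sum((not p) and o for p, o in zip(predicted, observed))
    return {
        "sensitivity": tp / (tp + fn),   # true drafted / all drafted
        "specificity": tn / (tn + fp),   # true not-drafted / all not-drafted
        "accuracy": (tp + tn) / len(observed),
    }


# Hypothetical probabilities and outcomes for ten players (not study data).
probs = [0.9, 0.1, 0.8, 0.3, 0.2, 0.7, 0.05, 0.4, 0.6, 0.15]
drafted = [True, False, False, False, False, True, False, False, False, False]
print(performance(classify_by_cutoff(probs, 0.20), drafted))
```

Because the cut-off is relative to the ranked list rather than an absolute probability, models with differently scaled outputs can be compared at the same threshold.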

Results
Of the 243 players in the 2021 draft for whom data were available, 38 (16%) were drafted and 205 (84%) were not-drafted. Fig 1 shows the distribution of drafted players per quartile by model and analysis type. In comparison with logistic regression, neural networks correctly predicted a greater proportion of drafted players in the first quartile (61 ± 6% vs. 41 ± 15%) and in the first two quartiles combined (85 ± 10% vs. 68 ± 17%). Factors identified from principal components analysis and used in modelling are presented in S1 Table. Listwise deletion was applied in logistic regression where cases had one or more missing data points (n = 22), which was not the case with neural networks. Consequently, prospective predictions were performed on all-position (logistic: n = 214; neural networks: n = 243), nomadic (logistic: n = 160; neural networks: n = 182), fixed (logistic: n = 39; neural networks: n = 46), and fixed&ruck (logistic: n = 54; neural networks: n = 61).
Table 2 presents the model performance specifically at the 15% and 35% drafted cut-off thresholds. The selected cut-off thresholds of 15% and 35% were chosen because the proportion of observed drafted players per position group was ~15%, and the 35% cut-off threshold represents the most common first instance of convergence (i.e., intersection) of specificity and sensitivity, observed in 8 of 12 cases (Fig 1). In total, Table 2 presents 48 models with 24 comparisons. At the 15% and 35% cut-off thresholds, neural networks correctly classified a greater proportion of drafted players than logistic regression in 21 of the 24 comparisons. At the 15% cut-off threshold, neural networks (mean = 37.3%) had greater sensitivity than logistic regression models (mean = 29.3%). At the 35% cut-off threshold, neural networks (mean = 71.3%) had greater sensitivity than logistic regression models (mean = 53.4%).
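One possible reading of the "first instance of convergence" is the smallest cut-off threshold at which the rising sensitivity curve meets or crosses the falling specificity curve. The Python sketch below implements that reading only; the function name and the example curve values are hypothetical and are not results from the study.

```python
def first_convergence(cutoffs, sensitivity, specificity):
    """Return the first cut-off where sensitivity meets or exceeds specificity.

    Returns None if the two curves do not intersect within the supplied range.
    """
    for cutoff, sens, spec in zip(cutoffs, sensitivity, specificity):
        if sens >= spec:
            return cutoff
    return None


# Illustrative curves over cut-offs of 5% to 50% (made-up values).
cutoffs = list(range(5, 55, 5))
sens = [0.10, 0.25, 0.37, 0.50, 0.60, 0.68, 0.71, 0.78, 0.84, 0.90]
spec = [0.97, 0.93, 0.89, 0.85, 0.80, 0.74, 0.70, 0.65, 0.60, 0.55]
print(first_convergence(cutoffs, sens, spec))
```

Under this reading, a model whose curves intersect at a lower cut-off reaches balanced sensitivity and specificity while flagging fewer players.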

Discussion
This is the first study in Australian Rules football to systematically collect multifactorial, league-wide data over consecutive seasons and predict draft outcome. Thirty-eight players for whom data were available were drafted. Logistic regression equations and neural networks were used, with resultant specificity, sensitivity and accuracy used to compare model performance. When reporting the best performing model for each positional group and data configuration, 73% of those were neural network models. Further, neural networks correctly classified a greater proportion of drafted players than logistic regression. However, when variables were factored, the difference in performance between logistic regression and neural networks across cut-off thresholds was trivial, and overall performance at optimal thresholds was similar. Conversely, when data were not factored, performance of neural networks was consistently superior to logistic regression. Neural networks were better than logistic regression at identifying drafted players in the first quartile and also in the first two quartiles combined (85 ± 10%). Comparative Receiver Operator Characteristics curves were used to compare model performance across a range of relative cut-off thresholds (5% to 50%). In 8 of the 12 model comparisons presented in Fig 1, the specificity and sensitivity curves intersected at lower cut-off thresholds in neural networks. That is, at any cut-off threshold before the point of intersection, neural networks correctly classified more true positive and true negative outcomes than logistic regression at the same cut-off threshold. In the same 8 comparisons, model accuracy (number of correct assessments/number of all assessments) also remained higher than logistic regression after this point of intersection.
Neural network models had greater sensitivity than logistic regression, whereas logistic regression had exceptionally high specificity. Model sensitivity is important when the recruiters' aim is to identify players possessing characteristics that differentiate them from the majority of other players (i.e., draft potential). A model with high sensitivity is successfully classifying a high proportion of players that are actually drafted. Conversely, high specificity signifies a model that correctly classifies a high proportion of players that are not-drafted. As model accuracy reflects both sensitivity and specificity, and because not-drafted players (NAB League = 85%) significantly outweigh drafted players (NAB League = 15%), a model with high sensitivity and specificity is more desirable than a model with low sensitivity and very high specificity, where the overall accuracy is comparable.
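The dependence of accuracy on class balance can be made explicit with a short worked example. The identity below follows directly from the definitions of sensitivity, specificity and accuracy; the two sets of sensitivity and specificity values are hypothetical and chosen only to illustrate the point, with the drafted proportion set to 15%.

```latex
\begin{align*}
\text{Accuracy} &= p \cdot \text{Sensitivity} + (1 - p) \cdot \text{Specificity},
  \qquad p = \text{proportion drafted} = 0.15 \\[4pt]
\text{Model A (low sensitivity):}\quad
  &0.15 \times 0.30 + 0.85 \times 0.95 = 0.045 + 0.8075 = 0.8525 \\
\text{Model B (high sensitivity):}\quad
  &0.15 \times 0.70 + 0.85 \times 0.88 = 0.105 + 0.748 = 0.8530
\end{align*}
```

Both illustrative models reach an accuracy of about 85%, yet Model B identifies more than twice the proportion of drafted players, which is why accuracy alone is a weak basis for choosing between models when drafted players are the minority class.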
These outcomes have implications in two important draft scenarios. Teams are allocated draft picks based on their ladder position the previous season, and any trades made internally between clubs. Approximately 120 players are drafted each year [18]. Clubs with early draft picks need to confidently (sensitivity) identify the talent that will immediately enhance their playing list. Smaller cut-off thresholds (e.g., 15%) with no positional bias would be useful to recruiters as neural networks in this instance are correctly classifying up to 15 drafted NAB League players. Conversely, as the draft progresses, clubs may opt for larger cut-off thresholds (e.g., 35%) or position-specific models that identify the talent to fill certain positional voids in their current playing list. At the 35% cut-off threshold, neural networks are correctly classifying 22/29 drafted nomadic players and 7/9 drafted fixed&ruck players.
The addition of a technical factor to logistic regression models resulted in improved model performance when comparing factored (Phys&GPS and Phys&GPS&Tech) models. When individual variables were used, performance was negatively influenced. This result is as expected given the limitations of logistic regression when dealing with multiple variables, many of which may be highly correlated (e.g., VJ and RVJL or RVJR) [19]. In contrast, draft prediction using neural networks was less effective using factored data and more effective when using individual variables. If modelling is undertaken for the purpose of training prescription, logistic regression using factored data provides insight on the impact of generalised capacities that influence draft success. However, specific domain knowledge is required to determine the number and relatability of the selected input factors. Conversely, neural networks can include more variables across a broad range of factors (e.g., S1 Table) and offer a more streamlined workflow by removing the need to perform factorisation.
There are a number of limitations that must be acknowledged within this study. Firstly, while positional analyses were used for exploratory purposes in this study, classifying players into positional groups can be problematic given that individuals can play multiple positions throughout the season, especially in elite-junior competition. Positional outcomes from this study should therefore be viewed with caution. Second, it is acknowledged that the data used in this study are solely from Victorian-based teams. While it is assumed that data from other state-based talent competitions would be similar when making recommendations, this is not supported by findings. Third, when inspecting the data, the 95% CIs of neural networks are somewhat larger than those of logistic regression. While neural network mean sensitivity values in particular are generally higher, the CIs often overlap logistic regression results, meaning conclusions must be taken with caution. Fourth, the models presented have been developed using drafted and not-drafted outcome data from previous seasons. Consequently, any decision to recruit a player based on outcomes from these models indicates a similarity to the type of player that has been drafted in the past and does not consider the success of the player after being drafted. As more data become available, future modelling could use a measurement of career success as the dependent variable. Finally, the absence of psychosocial variables within these analyses must be acknowledged. Researchers have identified the importance of certain psychological characteristics that can impact a career in elite sport. Characteristics deemed important include self-confidence, drive or motivation, commitment, mental toughness and resilience [1,18]. It is now common practice to assess these characteristics through interviewing procedures within the AFL draft and, if made available, they could be considered within future analyses [3].
Collectively, the findings from this study provide justification for the application of predictive modelling for both retrospective and prospective classification tasks like the AFL draft. If domain knowledge exists, the recommendation would be to use all available variables and neural networks for model development and prediction. If not, factored data and logistic regression offer an alternative solution, but these models have greater variability in predicting drafted players across positions and models. Recruiters could use data from multiple seasons to iteratively train neural network models and periodically predict draft outcome to add weight to their subjective selections. The process of predicting draft likelihood with this method might additionally identify players that had not previously been considered, or highlight trends in player performance that could aid or hinder their transition into the elite-senior game. For example, an upward trending athlete that did not feature in mid-season predictions, but did feature in post-season predictions, might be a more desirable draft pick than a downward trending player. Talent pathway coaches could use this information to objectively identify players with "current" draft potential and ensure they are developed to continue on that trajectory. Alternatively, identifying players who are ranked close to the cut-off threshold, and variables that they are deficient in, could inform targeted development for an individual player to improve their chances of being drafted.

Conclusion
This is the first study to systematically collect multifactorial, league-wide data over consecutive seasons and prospectively predict draft outcome in elite-junior Australian Rules football. Where the goal of analysis is to identify talented players with draft potential, model sensitivity is paramount, and neural networks generally outperformed logistic regression. Employing neural networks removes the need to factor data and resulted in superior classification of talented individuals when compared to logistic regression models. If logistic regression is to be used, data should be factored, and results applied with more caution. Future research could further develop these predictive models by including psychosocial characteristics, and players from other elite-junior talent pathway competitions.

Table 2. Model performance by position at the 15% and 35% drafted cut-off thresholds.